JMCP: Papers - jmcp/use-structured-output-for-tag-enrichment-eu897mic#34

Merged
landigf merged 1 commit into main from
jmcp/use-structured-output-for-tag-enrichment-eu897mic
Mar 25, 2026

landigf commented Mar 25, 2026

Review: Use structured output for tag enrichment

Change summary

Replaces the free-form provider.complete("tag-extraction", …) call with a dedicated provider.extractTags() method that uses the json_schema structured output mode of the xAI/Grok Responses API. The contract layer gains a Zod schema (tagExtractionResultSchema) and a mirrored JSON Schema (tagExtractionJsonSchema), so the LLM response is constrained at the API level and validated again at parse time.

Files touched: 3 — contracts/src/index.ts (new schemas), ai/src/index.ts (new method), web/trigger/tasks.ts (call-site simplified).

Validation confidence: High

  • All tests pass (7/7 packages)
  • Type-check clean across the monorepo (including Next.js build)
  • Zod parse after JSON.parse provides a runtime safety net even if the model drifts
  • .max(8) on the Zod array matches the system prompt instruction ("up to 8")
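
One way to keep the "up to 8" limit from diverging between the prompt and the schema is to derive both from a single constant. The sketch below is an illustration only: `MAX_TAGS`, `systemPromptSketch`, `tagJsonSchemaSketch`, and the `{ slug }` tag shape are assumptions, not the project's actual definitions.

```typescript
// Hypothetical shared limit; the review only states that the prompt asks for "up to 8".
const MAX_TAGS = 8;

// Illustrative prompt text, not the project's real system prompt.
const systemPromptSketch = `Extract up to ${MAX_TAGS} tags from the paper.`;

// Hand-written JSON Schema fragment. The tag shape ({ slug }) is an assumption;
// the real schema may carry more fields.
const tagJsonSchemaSketch = {
  type: "object",
  properties: {
    tags: {
      type: "array",
      maxItems: MAX_TAGS, // same constant the prompt uses
      items: {
        type: "object",
        properties: { slug: { type: "string" } },
        required: ["slug"],
        additionalProperties: false,
      },
    },
  },
  required: ["tags"],
  additionalProperties: false,
} as const;
```

On the Zod side the same constant could feed `.max(MAX_TAGS)`. Whether the provider's structured output mode honors `maxItems` is worth checking against its documentation before relying on it.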

Remaining risks

  1. Dual-schema drift — The Zod schema and the hand-written JSON Schema are defined independently. If one is updated and the other isn't, the API constraint and runtime validation will diverge. Consider generating one from the other (e.g. zod-to-json-schema).
  2. No maxItems in the JSON Schema — The Zod side enforces .max(8), but the JSON Schema sent to the model doesn't include "maxItems": 8, so the model could return more than 8 tags, which would then fail Zod parsing at runtime.
  3. JSON.parse on missing output_text — When output_text is missing, the code falls back to "{}", which parses to {}. The tags field is then undefined, and z.array() does not accept undefined, so the parse throws a ZodError instead of returning { tags: [] } as the disabled-provider path does. An explicit check or a graceful fallback would be safer.
  4. No timeout / retry — The fetch call has no AbortSignal timeout. A hung upstream will block the Trigger task indefinitely.
  5. Slug format not validated — The slug field is z.string() with no regex constraint for URL-safety, relying entirely on the model to obey the system prompt.
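
Items 4 and 5 could be addressed along these lines. This is a minimal sketch under stated assumptions: `fetchWithTimeout`, `isUrlSafeSlug`, the 30-second default, and the slug pattern are all hypothetical and not taken from the PR.

```typescript
// Assumed slug format: lowercase alphanumeric words joined by single hyphens.
const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

// Returns true when the slug matches the assumed URL-safe pattern.
function isUrlSafeSlug(slug: string): boolean {
  return SLUG_PATTERN.test(slug);
}

// Wraps a fetch-like function with a deadline enforced through AbortController.
// The timer is cleared once the response resolves (headers received);
// reading a slow body afterwards would need its own deadline.
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 30_000, // hypothetical default
  fetchImpl: typeof fetch = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer);
  }
}
```

With Zod, the same pattern could be attached to the slug field via `z.string().regex(SLUG_PATTERN)`, so a malformed slug fails validation instead of reaching a URL.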

None of these are blockers — the change is a clear improvement over the unstructured path. Items 2 and 3 are the most actionable quick fixes.
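
A possible shape for the runtime side of items 2 and 3 is sketched below, written without dependencies so it runs on its own. `parseTagOutput`, `TAG_LIMIT`, and the `{ slug }` tag shape are assumptions; in the actual code the hand-written guard would be replaced by the Zod schema.

```typescript
// Assumed tag shape; the real contract may include more fields.
type Tag = { slug: string };
type TagExtractionResult = { tags: Tag[] };

// Hypothetical name for the "up to 8" limit.
const TAG_LIMIT = 8;

// Minimal stand-in for schema validation of a single tag.
function isTag(value: unknown): value is Tag {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { slug?: unknown }).slug === "string"
  );
}

// Parses the model's output text, returning { tags: [] } when the text is
// missing, is not valid JSON, or has no tags array, mirroring the
// disabled-provider path described in the review.
function parseTagOutput(outputText: string | null | undefined): TagExtractionResult {
  if (!outputText) return { tags: [] };
  let parsed: unknown;
  try {
    parsed = JSON.parse(outputText);
  } catch {
    return { tags: [] };
  }
  const tags = (parsed as { tags?: unknown } | null)?.tags;
  if (!Array.isArray(tags)) return { tags: [] };
  // Drop malformed entries and truncate instead of failing on more than TAG_LIMIT tags.
  return { tags: tags.filter(isTag).slice(0, TAG_LIMIT) };
}
```

Truncating and silently returning an empty result are design choices in this sketch. Logging the failure, or throwing and letting the task retry, may suit the project better.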

@landigf landigf merged commit de981f1 into main Mar 25, 2026
1 check failed